Increased complexity and flexibility in ecological data modeling:
Generalized linear models (GLMs)
Mixture models (e.g. zero-inflated GLMs)
Hierarchical/multilevel models, GLMMs
But still few tools for model diagnostics
Problem: failing to check model assumptions
Can you trust your model?
Example count data:
Modeling count data, GL(M)M distributions:
Poisson
Binomial (K/N) proportion
Problem when the data have more or less variability than expected under the distribution used for modeling:
UNDER or OVERDISPERSION
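As a rough illustration of what "more variability than expected" means: a Poisson distribution implies a variance equal to the mean, so a variance-to-mean ratio far from 1 hints at a dispersion problem. The sketch below uses simulated data, and all object names and parameter values are hypothetical.

```r
set.seed(7)

# Hypothetical counts: one Poisson sample, one overdispersed (negative binomial) sample
y_pois <- rpois(1000, lambda = 4)
y_nb   <- rnbinom(1000, mu = 4, size = 1)

# Variance-to-mean ratio: close to 1 under a Poisson, larger under overdispersion
ratio_pois <- var(y_pois) / mean(y_pois)
ratio_nb   <- var(y_nb) / mean(y_nb)

ratio_pois
ratio_nb
```

This raw ratio ignores predictors, so it is only a first look and not a substitute for checking the residuals of a fitted model.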
DHARMa
“Real” overdispersion:
More variance than expected by the model.
Heteroscedasticity:
Variance increases/decreases with a predictor.
Zero-inflation:
More zeros than expected by the model.
Consequences of ignoring overdispersion:
Standard errors of estimates are too small -> confidence intervals are too narrow
Larger chance of type I error: finding an effect that does not exist
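The inflated type I error can be checked with a small simulation. This is a minimal sketch under assumed settings (sample size, mean, and negative binomial size are arbitrary): the predictor has no true effect, the counts are overdispersed, and a Poisson GLM is fitted anyway.

```r
set.seed(123)

n_rep <- 500   # number of simulated data sets (arbitrary choice)
n     <- 100   # observations per data set (arbitrary choice)
p_pois <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  x <- rnorm(n)                        # predictor with NO true effect on y
  y <- rnbinom(n, mu = 5, size = 1)    # overdispersed counts, independent of x
  fit <- glm(y ~ x, family = poisson)  # Poisson model ignores the extra variance
  p_pois[i] <- summary(fit)$coefficients["x", "Pr(>|z|)"]
}

# Share of data sets in which the null effect is declared "significant"
false_positive_rate <- mean(p_pois < 0.05)
false_positive_rate
```

With a correctly specified model this share should be close to the nominal 0.05. Under the assumed settings it is expected to be far higher.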
Detecting dispersion problems with DHARMa
DHARMa: scaled quantile residuals -> simulating from the model
Residuals between 0 and 1 for ANY model complexity or distribution
Interpreted the SAME way:
If your model is correctly specified, i.e. it matches the “data-generating process”, the scaled quantile residuals will follow a uniform (“flat”) distribution between 0 and 1.
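To make the idea concrete, the residuals can be computed by hand in base R. This is a simplified sketch of the principle, not DHARMa's exact implementation: simulate new responses from the fitted model, then record where each observation falls within its simulated distribution. The data and object names are hypothetical.

```r
set.seed(42)

# Hypothetical example data: counts driven by one predictor
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))

fit <- glm(y ~ x, family = poisson)

# Simulate new responses from the fitted model (one column per simulation)
n_sim <- 250
sims <- as.matrix(simulate(fit, nsim = n_sim))

# Scaled quantile residual: position of each observation within its
# simulated distribution. Ties are broken uniformly at random so that
# integer-valued responses still yield continuous residuals.
below <- rowMeans(sims < y)
equal <- rowMeans(sims == y)
res <- below + runif(n) * equal

range(res)   # all values lie between 0 and 1
hist(res, breaks = 10, main = "Scaled quantile residuals")
```

Because the model here is the one that generated the data, the histogram should look roughly flat. Fitting the wrong distribution or omitting a predictor would distort it.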
DHARMa
Create DHARMa residuals
Test dispersion problems
Test heteroscedasticity
Test zero inflation
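The four steps above could look roughly as follows. The data and object names are hypothetical, and `testQuantiles()` is used here as one possible way to look for heteroscedasticity along a predictor; `plotResiduals()` with the `form` argument is an alternative.

```r
library(DHARMa)

set.seed(1)

# Hypothetical example data and model
n <- 300
dat <- data.frame(x = runif(n))
dat$y <- rpois(n, lambda = exp(0.5 + dat$x))
fit <- glm(y ~ x, family = poisson, data = dat)

# 1. Create DHARMa residuals by simulating from the fitted model
sim_out <- simulateResiduals(fittedModel = fit, n = 250)
plot(sim_out)   # QQ plot and residuals vs. predicted values

# 2. Test for over- or underdispersion
disp_test <- testDispersion(sim_out)

# 3. Check whether the residual spread changes along a predictor
quant_test <- testQuantiles(sim_out, predictor = dat$x)

# 4. Test for zero-inflation
zi_test <- testZeroInflation(sim_out)

disp_test
zi_test
```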
glmmTMB
Overdispersion
Heteroscedasticity
Zero-inflation
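A minimal sketch of how each of the three problems might be addressed in glmmTMB, assuming a negative binomial family is appropriate. The data, variable names, and the choice of `nbinom2` are assumptions for illustration, not the models behind the output shown in these slides.

```r
library(glmmTMB)

set.seed(2)

# Hypothetical data: negative binomial counts with extra structural zeros
n <- 300
x <- runif(n)
y <- rnbinom(n, mu = exp(0.5 + x), size = 1.5)
y[rbinom(n, 1, 0.2) == 1] <- 0
dat <- data.frame(y = y, x = x)

# "Real" overdispersion: negative binomial instead of Poisson
m_nb <- glmmTMB(y ~ x, family = nbinom2, data = dat)

# Heteroscedasticity: let the dispersion parameter depend on a predictor
m_het <- glmmTMB(y ~ x, dispformula = ~ x, family = nbinom2, data = dat)

# Zero-inflation: add a constant zero-inflation probability
m_zi <- glmmTMB(y ~ x, ziformula = ~ 1, family = nbinom2, data = dat)

AIC(m_nb, m_het, m_zi)
```

Each refitted model can be passed back to `simulateResiduals()` from DHARMa to check whether the problem has gone away.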
Modeling “real” overdispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.1935, p-value = 0.224
alternative hypothesis: two.sided
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 0.97326, p-value = 0.848
alternative hypothesis: two.sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.9199, p-value < 2.2e-16
alternative hypothesis: two.sided
Modeling heteroscedasticity
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.106, p-value = 0.44
alternative hypothesis: two.sided
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 0.98758, p-value = 0.92
alternative hypothesis: two.sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 4.9124, p-value < 2.2e-16
alternative hypothesis: two.sided
Modeling zero-inflation
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.0414, p-value = 0.696
alternative hypothesis: two.sided
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = 0.99592, p-value = 1
alternative hypothesis: two.sided
Sometimes, residual patterns will not tell you what the cause of overdispersion is. E.g.:
‘Real’ overdispersion can produce a significant zero-inflation test, and vice versa.
‘Real’ overdispersion and zero-inflation may both produce significant heteroscedasticity tests.
DHARMa: residual tools to detect them
glmmTMB
Thank you!